
Population Dynamics Meet Network Optimization



Presented at NecSys 2025


Florian Dörfler

June 04, 2025


Transcript

  1. Population Dynamics Meet Network Optimization: A Decision-Dependent
     Stochastic Formulation. Florian Dörfler, IFAC Conference on Network
     Systems 2025. Joint work with Zhiyu He, Saverio Bolognani, & Michael
     Muehlebach.
  2. Decision-making in probability spaces. The state / decision variable is
     a probability measure when
     • using first-principles physics (e.g., quantum or statistical mechanics)
     • taking a probabilistic view of the world (e.g., Bayesian modeling)
     • robustifying against distribution shifts (distributionally robust
       optimization, DRO)
     • today: for large populations of agents, e.g., in mean-field settings
  3. Preview example: affinity maximization in a polarized population. An
     opportunistic political party promotes an ideology to a large-scale
     population of individuals.
  4. Polarized population model [Gaitonde et al., '21; Dean et al., '22;
     Hazła et al., '24]:
       p_{k+1} = normalize( λ·p_k + (1−λ)·p_0 + σ·(p_k^⊤ u)·u ),
     with individual position p_k, initial position p_0, ideology u, utility
     p_k^⊤ u, and biased-assimilation term σ·(p_k^⊤ u)·u; positions and
     ideology are normalized, ‖p_k‖ = ‖u‖ = 1. Given a fixed ideological
     position u, the steady state is p_ss = h(p_0, u). Geometry: a
     perpendicular angle between p_0 and u → not influenced; an acute angle
     → influenced; an obtuse angle → anti-influenced.
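The update above is easy to simulate. A minimal sketch with assumed parameter values λ = 0.7 and σ = 0.2 (the slide does not fix them), iterating a single individual's position to its steady state h(p_0, u):

```python
import numpy as np

# Assumed parameters (unspecified on the slide).
lam, sigma = 0.7, 0.2

def step(p, p0, u):
    """One round of opinion dynamics with biased assimilation."""
    raw = lam * p + (1 - lam) * p0 + sigma * (p @ u) * u
    return raw / np.linalg.norm(raw)  # 'normalize' keeps positions on the unit sphere

rng = np.random.default_rng(0)
p0 = rng.normal(size=3); p0 /= np.linalg.norm(p0)   # initial position
u = rng.normal(size=3);  u /= np.linalg.norm(u)     # fixed ideological position

p = p0.copy()
for _ in range(2000):
    p = step(p, p0, u)
p_ss = p                                     # numerically, p_ss = h(p0, u)
drift = np.linalg.norm(step(p_ss, p0, u) - p_ss)   # residual of one more step
```

For these mild parameter choices the iteration settles quickly, consistent with the contraction assumptions used later in the talk.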
  5. Modeling in probability space: distributional modeling for a
     large-scale population without identifiers. Same model, but the initial
     position is sampled from an initial distribution, p_0 ∼ μ_0. Given a
     fixed ideological position u, the steady-state distribution is the
     pushforward
       μ_ss(u) = h(u,·)_# μ_0.
  6. Affinity maximization. Goal: population-wide affinity optimization over
     the distributional steady state of the large-scale population,
       max_{‖u‖≤1} E_{p∼μ_ss(u)}[p^⊤ u]   s.t.   μ_ss(u) = h(u,·)_# μ_0.
     Challenges:
     • nonlinear, ∞-dimensional map h
     • unknown map h & distribution μ_0
     • the system is not at the steady state μ_ss
     • decision dependence of the expectation E_{p∼μ_ss(u)}
     The decision-maker can at best sample from the currently observed μ_k.
  7. Feedback loop: the decision-maker solves
       u_{k+1} = argmax_{‖u‖≤1} E_{p∼μ_k}[p^⊤ u]
     using the random variable p with current distribution μ_k (approximated
     by samples); the population dynamics then produce μ_{k+1}, and the
     decision dependence closes the loop. Open issues: non-stationary
     distributions, unknown & nonlinear dynamics, convergence, optimality.
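This myopic loop can be sketched directly: since the objective is linear in u, the best response over the unit ball is the normalized sample mean of the current positions. A minimal simulation with the population model from the preview example (parameter values λ = 0.7, σ = 0.2 and population size are assumptions for illustration):

```python
import numpy as np

lam, sigma = 0.7, 0.2
rng = np.random.default_rng(1)

# Population of n individuals on the unit sphere; mu_k is their empirical distribution.
n = 500
P0 = rng.normal(size=(n, 3))
P0 /= np.linalg.norm(P0, axis=1, keepdims=True)
P = P0.copy()

u = rng.normal(size=3)
u /= np.linalg.norm(u)

affinity = []
for k in range(100):
    # Population dynamics: each individual assimilates toward u (biased assimilation).
    raw = lam * P + (1 - lam) * P0 + sigma * (P @ u)[:, None] * u
    P = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    # Myopic best response: argmax_{||u||<=1} E_{p~mu_k}[p^T u] = normalized sample mean.
    mean_p = P.mean(axis=0)
    u = mean_p / np.linalg.norm(mean_p)
    affinity.append(float((P @ u).mean()))   # population-wide affinity under mu_k
```

Note that each decision u reshapes the next distribution μ_{k+1}, which is exactly the decision dependence the talk is about.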
  8. Contents & take-home messages:
     ① distribution dynamics: take them into account & control them
     ② online stochastic algorithm: adapt to & shape the distributions μ_k, μ_{k+1}
     ③ finite-sample generalization: even for a fixed sample distribution
     ④ case studies: affinity maximization & a recommender system
  9. Broader picture: randomness is common. Stochastic decision-making
       min_u E_{p∼μ_k}[Φ(u; p)],
     where the random variable p (positions, demands) follows μ_k and the
     decision variable u (ideology, price) enters the performance objective.
     (Re)sample & (re)optimize: the optimal decision is random, & so is the
     feedback. Endogenous uncertainty: μ_k depends on u.
  10. Problem formulation:
       min_u E_{p∼μ_ss(u)}[Φ(u, p)]   s.t.   μ_ss(u) = h(u,·)_# μ_d,
     where Φ is the utility (differentiable), h the steady-state map
     (differentiable in u & Lipschitz in d), and μ_ss(u) the decision-dependent
     steady-state distribution; the decision u_k induces the distribution
     shift μ_{k−1} → μ_k. Underlying distribution dynamics:
       p_k = f(p_{k−1}, u_k, d),   p_0 ∼ μ_0,   d ∼ μ_d,
     with initial state p_0, decision u_k, and exogenous disturbance d
     (f contracting in p & Lipschitz in (d, u); μ_d has finite absolute
     moments); the steady state satisfies p = h(u, d).
  11. Ingredient: online feedback optimization. For linear time-invariant
     dynamics with a constant disturbance d,
       p_{k+1} = A p_k + B u_k + E d,
     the steady-state map is
       p_ss = (I − A)^{−1} B u + (I − A)^{−1} E d = H_u u + H_d d.
     The optimization problem
       min_{u,p} Φ(u, p)   s.t.   p = H_u u + H_d d
     is solved by gradient descent requiring only the steady-state
     sensitivity H_u:
       u_{k+1} = u_k − η [I  H_u^⊤] ∇Φ(u_k, p_k),
     where ∇Φ is evaluated at the measurement p_k.
  12. Can we copy this recipe here? Define the objective
       Φ̃(u) ≜ E_{d∼μ_d}[Φ(u, h(u, d))].
     Ideally we would evaluate exact gradients:
       ∇Φ̃(u_k) = E_{d∼μ_d}[∇ Φ(u_k, h(u_k, d))]
                = E_{d∼μ_d}[∇_u Φ(u_k, h(u_k, d)) + ∇_u h(u_k, d) ∇_p Φ(u_k, p)|_{p=h(u_k,d)}]
                = E_{(p,d)∼γ_ss(u_k)}[∇_u Φ(u_k, p) + ∇_u h(u_k, d) ∇_p Φ(u_k, p)],
     by the chain rule & the law of the total derivative; conditions on Φ
     allow swapping ∇ & E; γ_ss(u_k) is the steady-state coupling induced by
     d ∼ μ_d & p ∼ h(u,·)_# μ_d. Challenges: the expectation (an integral)
     is hard to evaluate, and we have no access to the steady state. Online
     decision-making: use current samples from μ_k.
  13. Online stochastic algorithm. Replace the steady-state coupling by the
     current one (adaption plus anticipation & steering):
       ∇Φ̃(u_k)    = E_{(p,d)∼γ_ss(u_k)}[∇_u Φ(u_k, p) + ∇_u h(u_k, d) ∇_p Φ(u_k, p)]
       ∇̂_k Φ̃(u_k) = E_{(p,d)∼γ_k}[∇_u Φ(u_k, p) + ∇_u h(u_k, d) ∇_p Φ(u_k, p)],
     where μ_k is a finite-sample approximation of μ_ss(u_k). The mini-batch
     composite stochastic gradient update
       u_{k+1} = u_k − η · (1/n_mb) Σ_{i=1}^{n_mb} [∇_u Φ(u_k, p_k^i) + ∇_u h(u_k, d^i) ∇_p Φ(u_k, p_k^i)]
     in turn approximates the true gradient.
  14. Ingredient: Wasserstein distance, a metric measuring the discrepancy
     between distributions μ & ν:
       W_1(μ, ν) = inf_{γ∈Γ(μ,ν)} ∫_{X×Y} c(x, y) dγ(x, y),
     the minimum cost of transporting μ onto ν, where c(x, y) is the price
     of moving mass from x to y and the transport plan γ(x, y) is a joint
     distribution with marginals μ(x) & ν(y). Why Wasserstein? It handles
     general distributions with disjoint supports + admits a tractable theory.
  15. Contraction of distribution shifts under decisions. The Wasserstein
     distance between μ_k & μ_ss(u_k) is bounded:
       W_1(μ_k, μ_ss(u_k)) ≤ ∫ ‖f^k(p_0, u_{0:k}, d) − h(u_k, d)‖ dμ_0 dμ_d ≜ V_k,
     and under the assumption of Lipschitz dynamics V_k is contracting:
       V_k ≤ L_f^p V_{k−1} + const · ‖u_k − u_{k−1}‖,   L_f^p ∈ (0, 1),
     where ‖u_k − u_{k−1}‖ is the drift of decisions: μ_k is close to
     μ_ss(u_k) when u_k is close to u_{k−1}, and μ_k weakly converges to
     μ_ss(u_k). Implication: ∇̂_k Φ̃(u_k) is close to ∇Φ̃(u_k),
       ‖∇̂_k Φ̃(u_k) − ∇Φ̃(u_k)‖ ≤ const · W_1(μ_k, μ_ss(u_k)).
  16. Optimality in expectation. For any window of length T, with step size
     η = 𝒪(1/√T), the convergence measure (average expected second moments
     of gradients, w.r.t. the randomness in u_k) satisfies
       (1/T) Σ_{k=0}^{T−1} E[‖∇Φ̃(u_k)‖²] ≤ 8 (Φ̃(u_0) − Φ̃*)/(ηT) + const · η/n_mb + 𝒪(1/√T) = 𝒪(1/√T).
     Despite the dynamics & the composite structure, this is the same (best)
     𝒪(1/√T) rate as SGD for static non-convex problems. For the naïve
     vanilla gradient (no anticipation), the rate is slower & convergence
     holds only up to a non-vanishing bias.
  17. Population convergence. For any window of length T, with step size
     η = 𝒪(1/√T),
       (1/T) Σ_{k=0}^{T−1} E[W_1(μ_k, μ_ss(u_k))] ≤ const · (Φ̃(u_0) − Φ̃*)/(ηT) + const · η/n_mb + 𝒪(1/√T) = 𝒪(T^{−1/4}),
     so μ_k weakly converges to μ_ss(u_k) in the long run. Why this rate?
     The Wasserstein distance couples the distribution dynamics & the
     algorithm via
       V_k ≤ L_f^p V_{k−1} + L_f^p L_h^u ‖u_k − u_{k−1}‖,
     so the rate follows from the rate of the norm of the gradients (w.r.t.
     the randomness in u_k).
  18. High-probability guarantees. For any window of length T, with
     η = 𝒪(1/√T) and a fixed failure probability τ ∈ (0, 1), the following
     hold with probability at least 1 − τ:
       (1/T) Σ_{k=0}^{T−1} ‖∇Φ̃(u_k)‖² = 𝒪( (1/√T)(1 + lg(1/τ)) )          (second moment of gradient)
       (1/T) Σ_{k=0}^{T−1} W_1(μ_k, μ_ss(u_k)) = 𝒪( (1/T^{1/4})(1 + lg(1/τ)) )   (Wasserstein distance)
     Same dependence on T & benign scaling with τ, e.g., τ = 10⁻⁴ ⟹ lg(1/τ) = 4.
  19. Generalization from a fixed sample distribution. Question: how will
     solutions found based on Φ^N (with the empirical distribution μ_k^N)
     generalize to Φ (with the true distribution μ_k)? Here
       μ_k^N = (1/N) Σ_{i=1}^{N} δ_{p_k^i},
     i.e., p_0^1, …, p_0^N form a fixed sample distribution. Why? Seeding
     trials & restricted sampling (privacy, resource constraints).
  20. Generalization with high probability. We can only access N samples
     from μ_k^N & optimize the empirical objective
     Φ̃^N = (1/N) Σ_{i=1}^{N} Φ(u, p_k^i); with high probability, solutions
     generalize well to the true objective Φ̃. For any window of length T and
     any fixed τ ∈ (0, 1), when N ≥ (1/c_2) lg(2 c_1/τ) & η = 𝒪(1/√T), the
     following holds with probability at least 1 − τ:
       (1/T) Σ_{k=0}^{T−1} ‖∇Φ̃(u_k^N)‖² = 𝒪( (1/N)^{2/r} lg(1/τ)^{2/r} ) + 𝒪( (1/√T)(1 + lg(1/τ)) ),
     where r relates to the size of the exogenous disturbances, the gradient
     is that of the true objective, and the second term is the same rate as
     before.
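The gap between empirical and true gradients can be illustrated with a simple Monte Carlo check; the quadratic objective Φ(u, p) = (u − p)² and the Gaussian sampling distribution are assumptions for illustration, not the talk's setting:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed illustrative objective: Phi(u, p) = (u - p)^2 with p ~ N(mu_p, 1).
mu_p = 1.0
u = 0.3

def true_grad(u):
    # nabla_u E_p[(u - p)^2] = 2*(u - mu_p)
    return 2.0 * (u - mu_p)

def empirical_grad(u, samples):
    # gradient of the empirical objective Phi^N = (1/N) sum_i (u - p_i)^2
    return np.mean(2.0 * (u - samples))

errors = []
for N in (10, 100, 1000, 10_000):
    p = rng.normal(mu_p, 1.0, size=N)
    errors.append(abs(empirical_grad(u, p) - true_grad(u)))
```

The error of the empirical gradient shrinks roughly like 1/√N here, which is the kind of sample-size dependence the high-probability bound above quantifies.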
  21. Recall: polarized population model & affinity maximization. Model
       p_{k+1} = normalize( λ·p_k + (1−λ)·p_0 + σ·(p_k^⊤ u)·u ),   p_0 ∼ μ_0,
     with utility p_k^⊤ u, individual position p_k, ideology u, initial
     position p_0 sampled from the initial distribution μ_0, and biased
     assimilation. Goal: population-wide affinity optimization over the
     distributional steady state for the large-scale population,
       max_{‖u‖≤1} E_{p∼μ_ss(u)}[p^⊤ u]   s.t.   μ_ss(u) = h(u,·)_# μ_0.
  22. Optimality. Our composite algorithm, with anticipation & sensitivity
     estimation, anticipates & shapes the distribution dynamics and yields a
     significant improvement in optimality (approaching u*) over vanilla
     stochastic gradient descent without anticipation.
  23. Histograms of position-decision angles. Compared against the
     ground-truth optimum (via IPOPT): our composite algorithm yields more
     acute angles, i.e., higher population-wide affinities, with a histogram
     close to that of the optimal solution; the vanilla gradient leaves many
     samples in [95°, 110°], i.e., lower affinity.
  24. Dynamics of preference distribution & recommender system. Model for
     the users' preference p over a set of m items (e.g., books), with
     initial distribution p_0 of the user population:
       p_{k+1} = λ_1 · p_k + λ_2 · softmax(−ε u) + (1 − λ_1 − λ_2) · p_0,
     where the choice distribution softmax(−ε u) is a smoothed min(u) of the
     price u, and p = h(u, p_0) is the steady-state choice distribution.
     Goal of the recommender system: maximize gains & preserve diversity,
       max_u  p^⊤ u + ρ Σ_{i=1}^{m} p_i log p_i   s.t.   1^⊤ u = u_total,   0 ≤ u_i ≤ ū,
     i.e., gain from selling items aligned with preference, diversification
     by entropy regularization, and budget constraints.
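Because this preference update is linear in p, the steady state h(u, p_0) has a closed form, which a short simulation confirms; the numbers of items, mixing weights λ_1, λ_2, and prices u below are assumed values for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

# Assumed illustrative values.
m = 5
lam1, lam2 = 0.5, 0.3
eps = 2.0
rng = np.random.default_rng(4)

p0 = softmax(rng.normal(size=m))    # initial preference distribution
u = rng.uniform(0.5, 1.5, size=m)   # item prices

# Iterate p_{k+1} = lam1*p_k + lam2*softmax(-eps*u) + (1 - lam1 - lam2)*p0.
p = p0.copy()
for _ in range(200):
    p = lam1 * p + lam2 * softmax(-eps * u) + (1 - lam1 - lam2) * p0

# Linearity in p gives the closed-form steady state h(u, p0):
p_ss = (lam2 * softmax(-eps * u) + (1 - lam1 - lam2) * p0) / (1 - lam1)
```

Since λ_1 + λ_2 + (1 − λ_1 − λ_2) = 1 and each term is a distribution, p remains a probability vector at every step, matching the pushforward view of the dynamics.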
  25. Optimality & distributional convergence. Our composite algorithm, with
     anticipation & sensitivity estimation, anticipates & shapes the
     distribution dynamics and significantly improves optimality
     (approaching u*) over both vanilla stochastic gradient descent without
     anticipation and a derivative-free gradient in feedback à la RL.
  26. Conclusions: distribution shift is amenable to control; go with the
     flow! A distribution perspective & an online stochastic algorithm that
     adapts to & shapes distributions, with convergence & optimality
     guarantees (in expectation + with high probability) and generalization
     for fixed samples. Future work: stochastic constraints, game setups, &
     model-free variants.
  27. Z. He, S. Bolognani, M. Muehlebach. Paper on arXiv & code on GitHub.
     Recap: decision-making min_u E_{p∼μ_k}[Φ(u; p)] with random variable
     p ∼ μ_k, decision dependence, and population dynamics producing μ_{k+1}.