
Learning Agent-Based Models from Data

Transcript

  1. Learning Agent-Based Models from Data
     Gianmarco De Francisci Morales
     Principal Researcher • CENTAI • Team Lead, Social Algorithmics Team (SALT)
     [email protected]
  5. Agent-based model
     Evolution over time of a system of autonomous agents
     Agents interact according to predefined rules that encode sociological assumptions
     The system is simulated to draw conclusions
  7. Example: Schelling's segregation
     2 types of agents: R and B
     Satisfaction S_i: number of neighbors of the same color
     Homophily parameter τ
     If S_i < τ → relocate
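A minimal sketch of these dynamics in Python (the grid size, torus neighborhood, and relocation rule are my own illustrative choices, not from the deck; satisfaction is computed here as the fraction of occupied neighbors sharing the agent's color, a common variant):

```python
import random

SIZE, N_AGENTS, TAU = 10, 70, 0.5  # grid side, number of agents, homophily threshold

def neighbors(pos):
    """Moore neighborhood on a torus grid."""
    x, y = pos
    return [((x + dx) % SIZE, (y + dy) % SIZE)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def satisfaction(grid, pos):
    """Fraction of occupied neighbors sharing this agent's color."""
    occ = [grid[n] for n in neighbors(pos) if n in grid]
    return sum(c == grid[pos] for c in occ) / len(occ) if occ else 1.0

def step(grid, rng):
    """Every unsatisfied agent (S_i < tau) relocates to a random empty cell."""
    empty = [(x, y) for x in range(SIZE) for y in range(SIZE) if (x, y) not in grid]
    for pos in list(grid):
        if satisfaction(grid, pos) < TAU:
            new = empty.pop(rng.randrange(len(empty)))
            empty.append(pos)
            grid[new] = grid.pop(pos)

rng = random.Random(0)
cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
rng.shuffle(cells)
grid = dict(zip(cells[:N_AGENTS], ['R', 'B'] * (N_AGENTS // 2)))
for _ in range(30):
    step(grid, rng)
```

Even at moderate τ, repeated relocation produces segregated clusters: the model's point is that strong macro-level segregation emerges from mild micro-level preferences.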
  9. Are ABMs scientific models?
     Mechanistic models: explainable and causal by construction
     They reach the counterfactual level of the ladder of causality:
     association 𝔼(Y ∣ X), intervention 𝔼(Y ∣ do(X)), counterfactual 𝔼(Y_{X′} ∣ X, Y_X)
     But: data is not a first-class citizen
     No sound parameter-fitting procedure
  11. ABMs and Data
      ABM born as a "theory development tool": simulations generate implications of the encoded assumptions
      Now used as a forecasting tool (epidemiology, economics, etc.)
      Calibration sets parameters from data
  14. Calibration
      Run simulations with different parameters until the model reproduces summary statistics of the data
      No parameter significance or model selection
      Arbitrary choice of summary statistics and distance measure
      Manual, expensive, and error-prone process
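A toy version of such a calibration loop (the model, statistic, and distance below are hypothetical stand-ins) makes the arbitrariness concrete: nothing dictates the choice of `summary` or `distance`, and each candidate parameter costs a full simulation run:

```python
import random
import statistics

def simulate(p, rng, n=1000):
    """Hypothetical stand-in for an ABM run: n agents adopt with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def summary(xs):
    return statistics.mean(xs)   # arbitrary choice of summary statistic

def distance(a, b):
    return abs(a - b)            # arbitrary choice of distance measure

rng = random.Random(42)
observed = summary(simulate(0.3, rng))   # pretend observed data, true p = 0.3

# grid-search "calibration": re-run the model until the statistic matches
best_p = min((distance(summary(simulate(p, rng)), observed), p)
             for p in [i / 20 for i in range(1, 20)])[1]
```

The search recovers a value near the true parameter, but offers no standard error, no significance test, and no principled way to compare competing models.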
  17. Can we do better? Yes!
      Rewrite the ABM as a probabilistic generative model:
      X_t ∼ P_t(X_t ∣ Θ, X_{τ<t})
      Write the likelihood of the parameters given the data:
      ℒ(Θ ∣ X) = P_Θ(X ∣ Θ)
      Maximize it via automatic differentiation:
      Θ̂ = arg max_Θ ℒ(Θ ∣ X)
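The pipeline in miniature, for a one-parameter toy "ABM" where each of n agents independently adopts with probability θ: write the log-likelihood, then climb its gradient. The gradient is written by hand here to stay self-contained; it stands in for what an autodiff framework would derive automatically.

```python
import math

# Toy model: k adopters observed out of n agents, parameter theta
k, n = 30, 100

def log_likelihood(theta):
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def grad(theta):
    # hand-derived here; an autodiff framework would produce this automatically
    return k / theta - (n - k) / (1 - theta)

theta = 0.5                   # initial guess
for _ in range(2000):         # gradient ascent on the log-likelihood
    theta += 1e-4 * grad(theta)
# theta converges to the MLE, k/n = 0.3
```

The same recipe scales to real ABMs: express each transition X_t as a differentiable probability, sum the log-terms into ℒ(Θ ∣ X), and let autodiff supply the gradients.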
  20. Historical aside
      Maximum-likelihood estimation was formalized by Fisher
      The idea dates back to Daniel Bernoulli and Lagrange in the eighteenth century
      Fisher introduced the method as an alternative to the method of moments, which he criticized for the arbitrariness in the choice of moment equations
  21. Autodiff
      Set of techniques to evaluate the partial derivatives of a computer program
      Uses the chain rule to break up complex expressions:
      ∂f(g(x))/∂x = (∂f/∂g)(∂g/∂x)
      Popularized by neural networks and deep learning (backpropagation)
      Different from both numerical and symbolic differentiation
  22. Example: automatic differentiation (autodiff)
      Build the computation graph of the logistic function
      σ = 1 / (1 + e^{−(w₁x₁ + w₂x₂ + b)})
      then walk the graph backwards, multiplying the local derivative of each node:
      d(1/x)/dx = −1/x², d(x+1)/dx = 1, d(eˣ)/dx = eˣ, d(xy)/dx = y
      until reaching the gradients with respect to the inputs, e.g. ∂σ/∂w₁ and ∂σ/∂x₂
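The walkthrough above can be reproduced with a minimal reverse-mode autodiff class (a micrograd-style sketch of my own, not the deck's code): each node stores its parents and the local derivatives from the slide, and `backward` applies the chain rule through the logistic function's graph.

```python
import math

class Value:
    """Minimal reverse-mode autodiff node (micrograd-style sketch)."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data, self.parents, self.local_grads = data, parents, local_grads
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))  # d(x+y)/dx = 1

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))                            # d(xy)/dx = y

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (e,))                                   # d(e^x)/dx = e^x

    def recip(self):
        return Value(1 / self.data, (self,), (-1 / self.data ** 2,))    # d(1/x)/dx = -1/x^2

    def backward(self):
        """Chain rule, walking the graph backwards (valid for tree-shaped graphs)."""
        self.grad = 1.0
        stack = [self]
        while stack:
            v = stack.pop()
            for p, g in zip(v.parents, v.local_grads):
                p.grad += g * v.grad
                stack.append(p)

# the slide's graph: sigma = 1 / (1 + exp(-(w1*x1 + w2*x2 + b)))
w1, w2, b = Value(2.0), Value(-3.0), Value(-1.0)   # illustrative values
x1, x2 = Value(-1.0), Value(-2.0)
z = w1 * x1 + w2 * x2 + b                          # z = 3
sigma = ((z * -1).exp() + 1.0).recip()
sigma.backward()   # w1.grad now holds d(sigma)/d(w1) = sigma * (1 - sigma) * x1
```

After `backward`, the gradients at the leaves match the analytic derivative of the sigmoid, σ(1−σ), multiplied along each path of the graph.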
  28. Problem → Solution
      No parameter significance or model selection → probabilistic modeling
      Arbitrary choice of summary statistics and distance measure → data likelihood
      Manual, expensive, and error-prone process → automatic differentiation
  31. Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics
      Generative model of the bounded-confidence model (BCM-F):
      X: opinions, s: interaction outcome, e: interacting agents, ϵ: bounded confidence interval
      [Figure: plate diagram of the generative model, with state X_t evolving to X_{t+1}]
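For intuition, one interaction step of a bounded-confidence model might look like the following sketch (a generic Deffuant-style update of my own, not necessarily the paper's exact BCM-F): a random pair of agents e = (i, j) interacts, and the outcome s is positive iff their opinions are within ϵ of each other, in which case both move closer.

```python
import random

def bcm_step(x, eps, mu, rng):
    """One bounded-confidence interaction (generic sketch).
    Two random agents e = (i, j) interact; the outcome s is True iff their
    opinions differ by less than eps, in which case both opinions move
    toward each other by a convergence factor mu."""
    i, j = rng.sample(range(len(x)), 2)
    s = abs(x[i] - x[j]) < eps
    if s:
        x[i], x[j] = x[i] + mu * (x[j] - x[i]), x[j] + mu * (x[i] - x[j])
    return s

rng = random.Random(1)
x = [rng.random() for _ in range(100)]                    # initial opinions in [0, 1]
outcomes = [bcm_step(x, eps=0.2, mu=0.5, rng=rng) for _ in range(5000)]
```

The likelihood-based view treats each observed outcome s as a data point: the probability of the whole trace as a function of ϵ is exactly the likelihood the paper maximizes.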
  32. Micro-State Inference
      [Plots: number of agents M_t per opinion bin and number of buyers D^B_t, learned trace vs. ground truth, for both latent and observable variables]
  35. Opinion Trajectories
      [Plots: synthetic opinion trajectories over time for different parameter values, e.g., (n⁺ = 0.4, n⁻ = 0.6) vs. (n⁺ = 1.2, n⁻ = 1.6)]
      Parameter values encode different assumptions and determine significantly different latent trajectories
  36. Recovering parameters
      [Figure 4: examples of synthetic data traces generated in each scenario (different n⁺, n⁻ combinations); plots show the opinion trajectories over time]
  37. Real data: number of upvotes on comments
      Estimate the position of users and subreddits in opinion space
      Larger estimated distance of a user from a subreddit → fewer upvotes by that user on that subreddit
  39. Likelihoods are hard
      Writing the correct complete-data likelihood can be challenging
      Easy to make mistakes; requires a deep understanding of the data-generating process
      Is there a way to avoid it? Variational approximation!
  41. Variational Inference
      Bayesian technique to approximate intractable probability integrals
      Alternative to Monte Carlo sampling (e.g., MCMC, Gibbs sampling)
      Approximation of the posterior: P(Θ ∣ X) ≈ Q_ϕ(Θ)
      Q_ϕ = variational distribution from a tractable parametric family
      Variational parameters ϕ optimized by minimizing the KL divergence between Q and P
      Transforms inference into an optimization problem
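A self-contained illustration of the idea (all names and numbers below are hypothetical): fit a Gaussian Q_ϕ to an unnormalized Beta-shaped posterior by searching for the variational parameters ϕ = (μ, σ) that minimize KL(Q ‖ P), computed here by crude quadrature; a real implementation would optimize ϕ with gradients instead of grid search.

```python
import math

def log_posterior(theta):
    """Unnormalized log-posterior to approximate: Beta(31, 71)-shaped,
    e.g. 30 adoptions in 100 trials under a flat prior (hypothetical)."""
    return 30 * math.log(theta) + 70 * math.log(1 - theta)

def log_q(x, mu, sig):
    """Variational family Q_phi: Gaussian with phi = (mu, sig)."""
    return -0.5 * ((x - mu) / sig) ** 2 - math.log(sig * math.sqrt(2 * math.pi))

def kl_q_p(mu, sig, grid):
    """KL(Q || P) up to an additive constant, via quadrature on (0, 1)."""
    dx = grid[1] - grid[0]
    total = 0.0
    for t in grid:
        lq = log_q(t, mu, sig)
        total += math.exp(lq) * (lq - log_posterior(t)) * dx
    return total

grid = [i / 1000 for i in range(1, 1000)]
# "optimize" phi by grid search over candidate (mu, sig) pairs
kl, mu, sig = min((kl_q_p(m / 100, s / 100, grid), m / 100, s / 100)
                  for m in range(10, 60, 2) for s in range(2, 12))
# the best mu sits near the posterior mode, 0.3
```

Because the normalizing constant of P only shifts the KL by a constant, the unnormalized log-posterior suffices: this is what turns an intractable integration problem into an optimization problem.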
  43. No Need to Write the Likelihood
      Pipeline: Original ABM → Probabilistic Generative ABM → Variational Inference + Data
      → approximate posteriors over macro- and micro-parameters: P̃(ϵ ∣ data) and P̃(r_1, …, r_N ∣ data)
  44. Model Thinking
      Always think about the data-generating process
      Model-first thinking as justification for scientific claims
      Causal (possibly mechanistic) models (e.g., ABMs)
      Use scientifically significant models
  47. Conclusions
      Fit ABMs as statistical models: rewrite them as probabilistic models of the data-generating process
      Learn the latent variables of the models
      Forecasting and predictions; model selection
      Use data to figure out which models work (Ptolemy vs. Kepler)
      Bring ABMs in line with statistical (and scientific) models
  49. C. Monti, G. De Francisci Morales, F. Bonchi. "Learning Opinion Dynamics From Social Traces". KDD 2020.
      C. Monti, M. Pangallo, G. De Francisci Morales, F. Bonchi. "On Learning Agent-Based Models from Data". Scientific Reports, 2023.
      J. Lenti, C. Monti, G. De Francisci Morales. "Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models". WSDM 2024.
      J. Lenti, F. Silvestri, G. De Francisci Morales. "Variational Inference of Parameters in Opinion Dynamics Models". arXiv:2403.05358, 2024.
      [email protected] • https://gdfm.me • @gdfm7