model selection To: Approximate frequent subgraph mining Automatic differentiation From: Backpropagation for deep learning To: Learning agent-based models 4

for frequent subgraph mining Sampling-based algorithm Approximation guarantees on frequency No false negatives, perfect recall 100x faster than exact algorithm 6

Informally: measure of model capacity a set of elements called points a family of subsets of called ranges, is a range space 𝒟 ℛ 𝒟 ℛ ⊆ 2 𝒟 ( 𝒟 , ℛ) HARD! 8

Informally: measure of model capacity a set of elements called points a family of subsets of called ranges, is a range space 𝒟 ℛ 𝒟 ℛ ⊆ 2 𝒟 ( 𝒟 , ℛ) The projection of on is the set of subsets ℛ D ⊆ 𝒟 ℛ ∩ D := {h ∩ D ∣ h ∈ ℛ} HARD! 8

Informally: measure of model capacity a set of elements called points a family of subsets of called ranges, is a range space 𝒟 ℛ 𝒟 ℛ ⊆ 2 𝒟 ( 𝒟 , ℛ) The projection of on is the set of subsets ℛ D ⊆ 𝒟 ℛ ∩ D := {h ∩ D ∣ h ∈ ℛ} is shattered by if its projection contains all the subsets of : D ℛ D ℛ ∩ D = 2 |D| HARD! 8

Informally: measure of model capacity a set of elements called points a family of subsets of called ranges, is a range space 𝒟 ℛ 𝒟 ℛ ⊆ 2 𝒟 ( 𝒟 , ℛ) The projection of on is the set of subsets ℛ D ⊆ 𝒟 ℛ ∩ D := {h ∩ D ∣ h ∈ ℛ} is shattered by if its projection contains all the subsets of : D ℛ D ℛ ∩ D = 2 |D| The VC dimension of is the largest cardinality of a set that is shattered by d ( 𝒟 , ℛ) ℛ HARD! 8

Let be the set of discrete intervals in ℛ = {[a, b] ∩ ℤ : a ≤ b} 𝒟 Shattering set of two elements of is easy 𝒟 Impossible to shatter set of three elements {c, d, e} c < d < e 9

Let be the set of discrete intervals in ℛ = {[a, b] ∩ ℤ : a ≤ b} 𝒟 Shattering set of two elements of is easy 𝒟 Impossible to shatter set of three elements {c, d, e} c < d < e No range s.t. R ∈ ℛ R ∩ {c, d, e} = {c, e} 9

Let be the set of discrete intervals in ℛ = {[a, b] ∩ ℤ : a ≤ b} 𝒟 Shattering set of two elements of is easy 𝒟 Impossible to shatter set of three elements {c, d, e} c < d < e No range s.t. R ∈ ℛ R ∩ {c, d, e} = {c, e} VC dimension of this = ( 𝒟 , ℛ) 2 9

an approximation can we get from a sample? "When analyzing a random sample of size , with probability , the results are within an factor of the true results" N 1 − δ ε 11

an approximation can we get from a sample? "When analyzing a random sample of size , with probability , the results are within an factor of the true results" N 1 − δ ε Trade-off among sample size, accuracy, and complexity of the task 11

subset s.t. ε ( 𝒟 , ℛ) ε ∈ (0,1) A ⊆ 𝒟 |R ∩ 𝒟 | | 𝒟 | − |R ∩ A| |A| ≤ ε, for every R ∈ ℛ a range space with VC-dimension ( 𝒟 , ℛ) d Random sample of size N = 𝒪 ( 1 ε2 (d + log 1 δ )) 12

subset s.t. ε ( 𝒟 , ℛ) ε ∈ (0,1) A ⊆ 𝒟 |R ∩ 𝒟 | | 𝒟 | − |R ∩ A| |A| ≤ ε, for every R ∈ ℛ a range space with VC-dimension ( 𝒟 , ℛ) d Random sample of size N = 𝒪 ( 1 ε2 (d + log 1 δ )) Is -sample for with probability ε ( 𝒟 , ℛ) 1 − δ 12

pattern % = (+%, ⇢% ) in P and a vertex E 2 +% , the orbit ⌫% (E) of +% that is mapped to E by any automorphism of %, i.e., ⌫% (E) ⌘ {D 2 +% : 9` 2 Aut(%) s.t. `(D) = E} . The orbits of % form a partitioning of+% , for each D 2 ⌫% (E), it holds ⌫% (D) = in ⌫% (E) have the same label. In Fig. 1 we show examples of two patterns w v3 v1 v2 v3 v1 v2 O3 O2 O1 O2 O1 Fig. 1. Examples of paerns and orbits. Colors represent vertex labels, while circle paern on the le, v1 and v2 belong to the same orbit $1. On the right, each vertex Patterns and orbits Pattern: connected labeled graph Pattern equality: isomorphism Automorphism: isomorphism to itself Orbit: subset of pattern mapped to each other by automorphisms V2 V1 V3 V3 V2 V1 HARD! 15

Use range space Ri = {ZV (q) : q is an orbit of P with fV (P) ≥ τ} (V, Ri ) acceptable failure probability uniform sample of of size upper bound to the VC dimension δ ∈ (0,1) S V s d 20

Use range space Ri = {ZV (q) : q is an orbit of P with fV (P) ≥ τ} (V, Ri ) acceptable failure probability uniform sample of of size upper bound to the VC dimension δ ∈ (0,1) S V s d With high probability is an -sample for for S ε (V, Ri ) ε = d + log 1 δ 2s 20

∩ S| |S| ≤ εi Given that we can bound the error on every orbit, we can bound the error on its minimum fV (Pi ) − fS (Pi ) ≤ εi ⟹ fS (Pi ) ≥ fV (Pi ) − εi ≥ τ − εi 21

∩ S| |S| ≤ εi Given that we can bound the error on every orbit, we can bound the error on its minimum fV (Pi ) − fS (Pi ) ≤ εi ⟹ fS (Pi ) ≥ fV (Pi ) − εi ≥ τ − εi Lower bound on the frequency of a frequent pattern in the sample 21

patterns with vertices 2) Use them to compute an upper bound to the VC dimension of 3) Compute such that is an -sample for 4) Prune patterns that cannot be frequent with lower bound 5) Extend unpruned patterns to get candidate patterns with vertices ZS (q) i (V, Ri ) εi S εi (V, Ri ) fS (Pi ) ≥ τ − εi i + 1 23

ø 102 103 104 105 Running Time (s) Æ=1 Æ=0.8 exact Results First sampling-based algorithm Approximation guarantees on computed frequency No false negatives 24

ø 102 103 104 105 Running Time (s) Æ=1 Æ=0.8 exact Results First sampling-based algorithm Approximation guarantees on computed frequency No false negatives 24

a computer program Chain rule to break complex expressions Originally created for neural networks and deep learning (backpropagation) Different from numerical and symbolic differentiation ∂f(g(x)) ∂x = ∂f ∂g ∂g ∂x 26

) − f(x) h Slow (need to evaluate each dimension) and errors due to rounding Symbolic: Input=computation graph, Output=symbolic derivative Example: Mathematica 27

) − f(x) h Slow (need to evaluate each dimension) and errors due to rounding Symbolic: Input=computation graph, Output=symbolic derivative Example: Mathematica Slow (search and apply rules) and large intermediate state 27

specialized for ML) Learning protein structure (e.g., AlphaFold) Many-body Schrodinger equation (e.g., FermiNet) Stellarator coil design Di↵erentiable ray tracing Model uncertainty & sensitivity Optimization of ﬂuid simulations Example applications Neural Networks Optimization Ray tracing Fluid simulations Many more... 37

specialized for ML) Learning protein structure (e.g., AlphaFold) Many-body Schrodinger equation (e.g., FermiNet) Stellarator coil design Di↵erentiable ray tracing Model uncertainty & sensitivity Optimization of ﬂuid simulations Example applications Neural Networks Optimization Ray tracing Fluid simulations Many more... 37

Mechanistic and causal model of behavior Encodes sociological assumptions Agents interact according to prede fi ned rules Agents are simulated to draw conclusions 38

use it as forecasting tool Calibration of parameters: run simulations with different parameters until model is able to reproduce summary statistics of data Manual, expensive, and error-prone process 40

n+ = 0.4 n = 0.6 0 2 n+ = 1.2 n = 1.6 0 2 n+ = 0.2 n = 1.6 Figure 4: Examples of synthetic data traces generated in each scenario. Plots represent the opinion trajectories along time Opinion Trajectories Parameter values encode different assumptions and determine signi fi cantly different latent trajectories 45

v T t Figure 2: Translation of everage recent advances in probabilistic programming to express our models. These frameworks combine erative programming languages with primitives that stic constructs, such as sampling from a distribution. de a naturally rich environment for transforming PGABM counterparts. Once a model is written in diﬀerent algorithms can be used to solve the variable m. wn a proof-of-concept of how a traditional opinion based on bounded conﬁdence [16] can be translated orm [46]. Figure 2 shows the plate notation for such d from our work [46]), where we represent the latent users at time t with xt (x0 is the initial condition), observed interaction from the data. Similarly to Learning problem Given observable interactions fi nd: opinions for nodes in time and sign of each edge with maximum likelihood Use EM and gradient descent via automatic differentiation G = (V, E) xt V × {0,…, T} → [−1,1] s E → {−, +} 48

0 2 n+ = 0.4 n = 0.6 0 2 n+ = 1.2 n = 1.6 0 2 n+ = 0.2 n = 1.6 Figure 4: Examples of synthetic data traces generated in each scenario. Plots represent the opinion trajectories along time. 0 2 n+ = 0.6 n = 1.2 0 2 n+ = 0.4 n = 0.6 0 2 n+ = 1.2 n = 1.6 0 2 n+ = 0.2 n = 1.6 Figure 4: Examples of synthetic data traces generated in each scenario. Plots represent the opinion trajectories along time. 50 0 2 n+ = 0.6 n = 1.2 0 2 n+ = 0.4 n = 0.6 0 2 n+ = 1.2 n = 1.6 0 2 n+ = 0.2 n = 1.6 Figure 4: Examples of synthetic data traces generated in each scenario. Plots represent the opinion trajectories along time.

interesting building blocks VC dimension for approximation algorithms Automatic differentiation for agent- based models Repurpose it for your own goals Be curious, be bold: hack and invent! 53

Approximate Mining of Frequent Subgraph Patterns through Sampling” KDD 2021 + ACM TIST 2023 C. Monti, G. De Francisci Morales, F. Bonchi “Learning Opinion Dynamics From Social Traces” KDD 2020 C. Monti, M. Pangallo, G. De Francisci Morales, F. Bonchi “On Learning Agent-Based Models from Data” SciRep 2022 (accepted) + arXiv:2205.05052 54 [email protected] https://gdfm.me @gdfm7