# Repurpose, Reuse, Recycle the building blocks of Machine Learning

Keynote at the Machine Learning Day @KTH, 17/5/23.

May 19, 2023

## Transcript

1. ### Repurpose, Reuse, Recycle the building blocks of Machine Learning

Gianmarco De Francisci Morales, Principal Researcher, [email protected]

7. ### Today's Plan

Vapnik-Chervonenkis (VC) dimension: from statistical learning theory and model selection to approximate frequent subgraph mining.
Automatic differentiation: from backpropagation for deep learning to learning agent-based models.

10. ### 5 reasons to like the VC dimension

First approximation algorithm for frequent subgraph mining. Sampling-based algorithm. Approximation guarantees on frequency. No false negatives, perfect recall. 100x faster than the exact algorithm.

11. ### Linear model in 2D

Can shatter 3 points. Cannot shatter 4 points.

13. ### VC dimension definition

Concept from statistical learning theory. Informally: a measure of model capacity. HARD!
A set $\mathcal{D}$ of elements called points, and a family $\mathcal{R} \subseteq 2^{\mathcal{D}}$ of subsets of $\mathcal{D}$ called ranges: $(\mathcal{D}, \mathcal{R})$ is a range space.
The projection of $\mathcal{R}$ on $D \subseteq \mathcal{D}$ is the set of subsets $\mathcal{R} \cap D := \{h \cap D \mid h \in \mathcal{R}\}$.
$D$ is shattered by $\mathcal{R}$ if its projection contains all the subsets of $D$: $|\mathcal{R} \cap D| = 2^{|D|}$.
The VC dimension $d$ of $(\mathcal{D}, \mathcal{R})$ is the largest cardinality of a set that is shattered by $\mathcal{R}$.

20. ### Example: Intervals

Let $\mathcal{D}$ be the elements of $\mathbb{Z}$. Let $\mathcal{R} = \{[a, b] \cap \mathbb{Z} : a \le b\}$ be the set of discrete intervals in $\mathcal{D}$.
Shattering a set of two elements of $\mathcal{D}$ is easy. It is impossible to shatter a set of three elements $\{c, d, e\}$ with $c < d < e$: there is no range $R \in \mathcal{R}$ s.t. $R \cap \{c, d, e\} = \{c, e\}$.
The VC dimension of this $(\mathcal{D}, \mathcal{R})$ is $2$.
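
Since shattering is a finite check, the interval example can be verified by brute force. Here is a minimal Python sketch (mine, not from the talk; the helper names are hypothetical):

```python
from itertools import combinations

def projection(ranges, D):
    """Projection of the range family on D: {h ∩ D for h in ranges}."""
    return {frozenset(h & D) for h in ranges}

def is_shattered(ranges, D):
    """D is shattered iff the projection contains all 2^|D| subsets of D."""
    return len(projection(ranges, D)) == 2 ** len(D)

# Ground set: a small window of the integers, standing in for Z.
ground = range(10)
intervals = [frozenset(range(a, b + 1)) for a in ground for b in ground if a <= b]

# Any two points can be shattered; no three points can, since no interval
# picks {c, e} while skipping the middle point d.
assert is_shattered(intervals, frozenset({2, 5}))
assert not any(is_shattered(intervals, frozenset(D)) for D in combinations(ground, 3))
print("VC dimension of discrete intervals on this window: 2")
```
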
25. ### VC dimension in ML

$$\Pr\left[\ \text{test error} \le \text{training error} + \sqrt{\frac{d\left(\log\frac{2N}{d} + 1\right) - \log\frac{\delta}{4}}{N}}\ \right] = 1 - \delta$$
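
Plugging numbers into the bound shows how the margin shrinks with $N$ and grows with $d$. A small sketch, assuming the standard Vapnik form reconstructed above:

```python
import math

def vc_bound_term(N, d, delta):
    """Confidence term of the VC generalization bound."""
    return math.sqrt((d * (math.log(2 * N / d) + 1) - math.log(delta / 4)) / N)

# With N = 10_000 samples and VC dimension d = 10, with probability 0.95 the
# test error exceeds the training error by at most this margin:
print(vc_bound_term(10_000, 10, 0.05))  # ~0.095
```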

29. ### VC dimension for data analysis

Dataset = Sample. How good an approximation can we get from a sample?
"When analyzing a random sample of size $N$, with probability $1 - \delta$, the results are within an $\varepsilon$ factor of the true results."
Trade-off among sample size, accuracy, and complexity of the task.

33. ### ε-sample and VC dimension

An $\varepsilon$-sample for $(\mathcal{D}, \mathcal{R})$: for $\varepsilon \in (0,1)$, a subset $A \subseteq \mathcal{D}$ s.t.
$$\left|\frac{|R \cap \mathcal{D}|}{|\mathcal{D}|} - \frac{|R \cap A|}{|A|}\right| \le \varepsilon, \quad \text{for every } R \in \mathcal{R}$$
Given a range space $(\mathcal{D}, \mathcal{R})$ with VC dimension $d$, a random sample of size
$$N = \mathcal{O}\left(\frac{1}{\varepsilon^2}\left(d + \log\frac{1}{\delta}\right)\right)$$
is an $\varepsilon$-sample for $(\mathcal{D}, \mathcal{R})$ with probability $1 - \delta$.
37. ### Example applications

Betweenness centrality, clustering coefficient, set cover, frequent itemset mining.

41. ### Patterns and orbits

Pattern: connected labeled graph. Pattern equality: isomorphism. Automorphism: isomorphism to itself. Orbit: subset of the pattern's vertices mapped to each other by automorphisms. HARD!
Formally, given a pattern $P = (V_P, E_P)$ and a vertex $v \in V_P$, the orbit $B_P(v)$ is the subset of $V_P$ mapped to $v$ by any automorphism of $P$, i.e., $B_P(v) \equiv \{u \in V_P : \exists \mu \in \mathrm{Aut}(P) \text{ s.t. } \mu(u) = v\}$. The orbits of $P$ form a partitioning of $V_P$: for each $u \in B_P(v)$ it holds that $B_P(u) = B_P(v)$, and vertices in the same orbit have the same label.
(Figure 1: examples of two patterns and their orbits; colors represent vertex labels. In the pattern on the left, $v_1$ and $v_2$ belong to the same orbit.)

50. ### Frequency of a pattern

(Figure: a graph, a pattern, and its frequency.) Not anti-monotone!

51. ### Minimum Node-based Image (MNI)

(Figure: a graph with vertices V1..V5 and a two-node pattern.)
The image of the first pattern node is {V1}; the image of the second is {V2, V3, V4, V5}.
MNI frequency = min(1, 4) = 1. Anti-monotone!

60. ### Relative MNI frequency

$Z_V(q)$ = image set of orbit $q$ of pattern $P$ on $G = (V, E)$. Relative MNI frequency of pattern $P$ in graph $G$:
$$f_V(P) = \min_{q \in P} \left\{ \frac{|Z_V(q)|}{|V|} \right\}$$
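
In code, the relative MNI frequency is just a minimum over the orbit image sets. A sketch with hypothetical names:

```python
def relative_mni_frequency(image_sets, num_vertices):
    """min over orbits q of |Z_V(q)| / |V|, given one image set per orbit."""
    return min(len(z) for z in image_sets) / num_vertices

# The MNI example above: image sets {V1} and {V2, V3, V4, V5} in a 5-vertex graph.
print(relative_mni_frequency([{"V1"}, {"V2", "V3", "V4", "V5"}], 5))  # 0.2
```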

63. ### Approx. Frequent Subgraph Mining

Given a threshold $\tau$ and a sample $S$ of vertices, with probability at least $1 - \delta$, for every pattern $P$ with $f_V(P) \ge \tau$, find $(P, \varepsilon_P)$ s.t.
$$f_V(P) - f_S(P) = \frac{|Z_V(q)|}{|V|} - \frac{|Z_S(q)|}{|S|} \le \varepsilon_P$$
This is exactly the $\varepsilon$-sample guarantee, $\left| \frac{|R \cap \mathcal{D}|}{|\mathcal{D}|} - \frac{|R \cap A|}{|A|} \right| \le \varepsilon$.

71. ### Empirical VC dimension for FSG

Use the range space $(V, R_i)$, where $R_i = \{Z_V(q) : q \text{ is an orbit of } P \text{ with } f_V(P) \ge \tau\}$ collects the image sets of the orbits of frequent patterns.
Given an acceptable failure probability $\delta \in (0,1)$, a uniform sample $S$ of $V$ of size $s$, and an upper bound $d$ to the VC dimension, with high probability $S$ is an $\varepsilon$-sample for $(V, R_i)$ with
$$\varepsilon = \sqrt{\frac{d + \log\frac{1}{\delta}}{2s}}$$

75. ### Pruning

$\varepsilon$-sample guarantee:
$$\left| \frac{|R_i \cap V|}{|V|} - \frac{|R_i \cap S|}{|S|} \right| \le \varepsilon_i$$
Given that we can bound the error on every orbit, we can bound the error on its minimum:
$$\left| f_V(P_i) - f_S(P_i) \right| \le \varepsilon_i \implies f_S(P_i) \ge f_V(P_i) - \varepsilon_i \ge \tau - \varepsilon_i$$
This is a lower bound on the frequency of a frequent pattern in the sample: any pattern whose sample frequency falls below $\tau - \varepsilon_i$ can be safely pruned.
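
Putting the two pieces together, a sketch (names mine) of the $\varepsilon$ computation and the pruning test:

```python
import math

def epsilon(d, delta, s):
    """Eps for a uniform sample of size s, given VC dimension upper bound d."""
    return math.sqrt((d + math.log(1 / delta)) / (2 * s))

def may_be_frequent(f_sample, tau, eps):
    """Keep a pattern only if its sample frequency clears tau - eps."""
    return f_sample >= tau - eps

eps_i = epsilon(d=3, delta=0.1, s=2_000)
print(eps_i)                                       # ~0.036
print(may_be_frequent(0.18, tau=0.20, eps=eps_i))  # True: cannot be pruned
```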

85. ### MaNIACS

1) Find the image sets $Z_S(q)$ of the orbits of unpruned patterns with $i$ vertices.
2) Use them to compute an upper bound to the VC dimension of $(V, R_i)$.
3) Compute $\varepsilon_i$ such that $S$ is an $\varepsilon_i$-sample for $(V, R_i)$.
4) Prune patterns that cannot be frequent, with the lower bound $f_S(P_i) \ge \tau - \varepsilon_i$.
5) Extend unpruned patterns to get candidate patterns with $i + 1$ vertices.
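
As a structural sketch only, the loop looks like this in Python. Every injected callable is a hypothetical stand-in, not the authors' API; `epsilon` and `relative_mni_frequency` are the helpers sketched above:

```python
def maniacs(sample_size, seeds, image_sets_on_sample, vc_upper_bound, extend,
            tau, delta, max_size):
    frequent, candidates = [], seeds                 # patterns with i = 1 vertices
    for i in range(1, max_size + 1):
        images = {P: image_sets_on_sample(P) for P in candidates}        # step 1
        d = vc_upper_bound(images.values())                              # step 2
        eps_i = epsilon(d, delta, sample_size)                           # step 3
        survivors = [P for P in candidates                               # step 4
                     if relative_mni_frequency(images[P], sample_size) >= tau - eps_i]
        frequent += [(P, eps_i) for P in survivors]
        candidates = extend(survivors)                                   # step 5
    return frequent
```
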
86. ### Results

First sampling-based algorithm. Approximation guarantees on the computed frequency. No false negatives.
(Plots: running time (s, log scale) vs. minimum frequency threshold $\tau$ for $\alpha = 1$, $\alpha = 0.8$, and the exact algorithm; observed MaxAE and its bound vs. sample size, for $\varepsilon_2$ through $\varepsilon_5$.)

90. ### Autodiff

Set of techniques to evaluate the partial derivatives of a computer program. Uses the chain rule to break up complex expressions:
$$\frac{\partial f(g(x))}{\partial x} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial x}$$
Originally created for neural networks and deep learning (backpropagation). Different from numerical and symbolic differentiation.
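
For instance, in PyTorch (one of the AD libraries mentioned below) the chain rule is applied automatically to ordinary program code:

```python
import math
import torch

# d/dx of f(g(x)) with g(x) = x^2 and f(u) = sin(u): cos(x^2) * 2x.
x = torch.tensor(1.5, requires_grad=True)
y = torch.sin(x ** 2)
y.backward()
print(x.grad)                        # autodiff result
print(math.cos(1.5 ** 2) * 2 * 1.5)  # chain rule by hand: same value, ~-1.885
```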

92. ### Alternatives

Numerical, with a small step $h$:
$$\frac{\partial f(x)}{\partial x_i} \approx \frac{f(x + h e_i) - f(x)}{h}$$
Slow (need to evaluate each dimension) and errors due to rounding.
Symbolic: input = computation graph, output = symbolic derivative. Example: Mathematica. Slow (search and apply rules) and large intermediate state.
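
A sketch of the numerical alternative, showing the one-evaluation-per-dimension cost:

```python
def numerical_grad(f, x, h=1e-6):
    """Forward differences: one extra function evaluation per coordinate."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xh = list(x)
        xh[i] += h          # h trades truncation error against rounding error
        grad.append((f(xh) - fx) / h)
    return grad

f = lambda v: v[0] ** 2 + 3 * v[0] * v[1]
print(numerical_grad(f, [1.0, 2.0]))  # ~[8.0, 3.0]; exact: [2x + 3y, 3x]
```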

99. ### Example: Automatic Differentiation (autodiff)

Create a computation graph for the gradient computation of
$$f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$$
built from elementary gates: multiply, add, multiply by $-1$, $e^y$, $+1$, and $1/y$. Walk the graph backwards, applying each gate's local derivative:
$f(y) = 1/y \Rightarrow \frac{\partial f}{\partial y} = -\frac{1}{y^2}$; $\quad f(y) = y + 1 \Rightarrow \frac{\partial f}{\partial y} = 1$; $\quad f(y) = e^y \Rightarrow \frac{\partial f}{\partial y} = e^y$; $\quad f(y, a) = ay \Rightarrow \frac{\partial f}{\partial y} = a$.
Multiplying the local derivatives along each path yields the gradients $\frac{\partial f}{\partial w_i}$ and $\frac{\partial f}{\partial x_i}$.
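
The same backward walk takes a few lines by hand. A sketch with illustrative input values (my choice; the slide's numbers are not recoverable from the transcript):

```python
import math

w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass through the gates: *, +, *(-1), exp, +1, 1/y.
s = w0 * x0 + w1 * x1 + w2        # affine score
f = 1.0 / (1.0 + math.exp(-s))    # sigmoid

# Backward pass: the local derivatives of 1/y, +1, exp, and *(-1) compose
# to the familiar sigmoid gradient f * (1 - f).
df_ds = f * (1 - f)
print(df_ds * x0, df_ds * w0)     # df/dw0 and df/dx0, via the multiply gate
```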

106. ### A few highlights

Example applications: machine learning (TensorFlow and PyTorch are AD libraries specialized for ML), learning protein structure (e.g., AlphaFold), the many-body Schrödinger equation (e.g., FermiNet), stellarator coil design, differentiable ray tracing, model uncertainty & sensitivity, optimization of fluid simulations. Many more...

108. ### Agent-based model

Evolution over time of a system of autonomous agents. Mechanistic and causal model of behavior. Encodes sociological assumptions. Agents interact according to predefined rules. Agents are simulated to draw conclusions.

109. ### Example: Schelling's segregation

2 types of agents: R and B. Satisfaction $S_i$: number of neighbors of the same color. Homophily parameter $\tau$. If $S_i < \tau$ → relocate.
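
A minimal sketch (mine) of the relocation rule on a ring lattice, where each agent has two neighbors:

```python
import random

def step(grid, tau):
    """Agents with fewer than tau same-color neighbors move to an empty cell."""
    n = len(grid)
    empty = [i for i, c in enumerate(grid) if c is None]
    for i, color in enumerate(grid):
        if color is None:
            continue
        neighbors = [grid[(i - 1) % n], grid[(i + 1) % n]]
        satisfaction = sum(1 for c in neighbors if c == color)
        if satisfaction < tau and empty:
            j = random.choice(empty)
            grid[j], grid[i] = color, None
            empty.remove(j)
            empty.append(i)

grid = [random.choice(["R", "B", None]) for _ in range(30)]
for _ in range(50):
    step(grid, tau=1)
print("".join(c or "." for c in grid))  # runs of like colors tend to grow
```
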
111. ### What about data?

An ABM is a "theory development tool". Some people use it as a forecasting tool. Calibration of parameters: run simulations with different parameters until the model is able to reproduce summary statistics of the data. A manual, expensive, and error-prone process.

115. ### Can we do better?

Yes! Rewrite the ABM as a Probabilistic Generative Model. Write the likelihood of the parameters given the data, $\mathcal{L}(\Theta \mid X)$, and maximize it via automatic differentiation:
$$\hat{\Theta} = \arg\max_{\Theta} \mathcal{L}(\Theta \mid X)$$

117. ### Opinion dynamics

How people's beliefs evolve. Polarization, radicalization, echo chambers. Data from social media.

119. ### Bounded Confidence Model

Opinion $x_u \in [-1, 1]$. Each time agents interact, they get closer if their opinions are closer than $\epsilon^+$. Positive interaction.

121. ### Repulsive behavior

Can interactions backfire? Each time agents interact, they get further away if their opinions were further apart than $\epsilon^-$. Negative interaction.
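
Both rules fit in one interaction step. A sketch, with $\mu$ an assumed convergence-rate parameter that the transcript does not specify:

```python
import random

def interact(x_u, x_v, eps_plus, eps_minus, mu=0.1):
    d = x_v - x_u
    if abs(d) < eps_plus:       # positive interaction: opinions attract
        x_u, x_v = x_u + mu * d, x_v - mu * d
    elif abs(d) > eps_minus:    # negative interaction: opinions repel
        x_u, x_v = x_u - mu * d, x_v + mu * d
    clip = lambda y: max(-1.0, min(1.0, y))   # opinions stay in [-1, 1]
    return clip(x_u), clip(x_v)

x = [random.uniform(-1, 1) for _ in range(100)]
for _ in range(10_000):
    u, v = random.sample(range(100), 2)
    x[u], x[v] = interact(x[u], x[v], eps_plus=0.4, eps_minus=1.2)
```
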
123. ### Opinion trajectories

(Figure 4: examples of synthetic data traces generated in each scenario, e.g. $\epsilon^+ = 0.6, \epsilon^- = 1.2$; plots represent the opinion trajectories along time.)
Parameter values encode different assumptions and determine significantly different latent trajectories.

124. ### Rewrite as probabilistic model

Replace the step function with a smooth version (sigmoid). The hard rule
$$|x_u - x_v| > \epsilon^- \implies S(u, v) = -1$$
becomes a likelihood in the opinion distance:
$$P((u, v) \in E \mid S(u, v) = -1) \propto \sigma\left(|x_u - x_v| - \epsilon^-\right)$$
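
In PyTorch the relaxation makes the log-likelihood differentiable end to end, in both the opinions and the threshold. A sketch with my own function name:

```python
import torch

def neg_interaction_logprob(x_u, x_v, eps_minus):
    """Smooth log-probability of a negative interaction between u and v."""
    return torch.log(torch.sigmoid(torch.abs(x_u - x_v) - eps_minus))

x_u = torch.tensor(0.8, requires_grad=True)
x_v = torch.tensor(-0.6, requires_grad=True)
eps_minus = torch.tensor(1.2, requires_grad=True)

ll = neg_interaction_logprob(x_u, x_v, eps_minus)
ll.backward()        # gradients flow to the opinions and the threshold alike
print(x_u.grad, eps_minus.grad)
```
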
125. ### Learning from data

Assume we observe the presence of interactions, but their signs are latent, and the opinions of the users are latent. Can we learn the dynamics and parameters of the system?

126. ### Learning problem

Given the observable interactions $G = (V, E)$, find the opinions of the nodes over time, $x_t : V \times \{0, \ldots, T\} \to [-1, 1]$, and the sign of each edge, $s : E \to \{-, +\}$, with maximum likelihood. Use EM and gradient descent via automatic differentiation.
(Figure 2: plate notation of the model, with the latent opinions $x_t$ at each time $t$, the initial condition $x_0$, and the observed interactions $(u, v)$.)
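
A schematic sketch (mine, not the paper's code) of one gradient step of the M-step, with edge-sign responsibilities from an assumed E-step:

```python
import torch

T, n = 20, 50
x = torch.randn(T + 1, n, requires_grad=True)        # latent opinions over time
eps = torch.tensor([0.4, 1.2], requires_grad=True)   # [eps_plus, eps_minus]
opt = torch.optim.Adam([x, eps], lr=0.01)

def log_likelihood(x, eps, edges, r):
    """Expected log-likelihood; r[k] = P(sign of edge k is +1 | data)."""
    ll = 0.0
    for (t, u, v), r_uv in zip(edges, r):
        d = torch.abs(x[t, u] - x[t, v])
        ll = ll + r_uv * torch.log(torch.sigmoid(eps[0] - d))         # attractive
        ll = ll + (1 - r_uv) * torch.log(torch.sigmoid(d - eps[1]))   # repulsive
    return ll

# One gradient step on hypothetical observed edges and responsibilities:
edges, r = [(3, 0, 1), (7, 2, 5)], [0.9, 0.2]
opt.zero_grad()
loss = -log_likelihood(x, eps, edges, r)
loss.backward()
opt.step()
```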

128. ### Recovering parameters

(Figure: the synthetic opinion trajectories of each scenario, e.g. $\epsilon^+ = 0.6, \epsilon^- = 1.2$, shown alongside the trajectories recovered by the learned model.)

130. ### Real data: Reddit

Comments, score = upvotes. Estimate the positions of users and subreddits in opinion space. Larger estimated distance of a user from a subreddit → lower score of the user on that subreddit.

132. ### Call to Action

Machine Learning is a treasure trove of interesting building blocks: VC dimension for approximation algorithms, automatic differentiation for agent-based models. Repurpose it for your own goals. Be curious, be bold: hack and invent!

133. ### References

G. Preti, G. De Francisci Morales, M. Riondato. "MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling". KDD 2021 + ACM TIST 2023.
C. Monti, G. De Francisci Morales, F. Bonchi. "Learning Opinion Dynamics From Social Traces". KDD 2020.
C. Monti, M. Pangallo, G. De Francisci Morales, F. Bonchi. "On Learning Agent-Based Models from Data". SciRep 2022 (accepted) + arXiv:2205.05052.

[email protected] · https://gdfm.me · @gdfm7